Subset Labeled LDA for Large-Scale Multi-Label Classification

نویسندگان

  • Yannis Papanikolaou
  • Grigorios Tsoumakas
چکیده

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm, to address multi-label learning tasks. Previous work has shown it to perform in par with other state-ofthe-art multi-label methods. Nonetheless, with increasing label sets sizes LLDA encounters scalability issues. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm, that not only can effectively scale up to problems with hundreds of thousands of labels but also improves over the LLDA state-of-the-art. We conduct extensive experiments on eight data sets, with label sets sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior–LDA, Dep–LDA), as well as the state of the art in extreme multi-label classification. The results show a steady advantage of our method over the other LLDA algorithms and competitive results compared to the extreme multi-label classification algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Latent Dirichlet Allocation complement in the vector space model for Multi-Label Text Classification

In text classification task one of the main problems is to choose which features give the best results. Various features can be used like words, n-grams, syntactic n-grams of various types (POS tags, dependency relations, mixed, etc.), or a combinations of these features can be considered. Also, algorithms for dimensionality reduction of these sets of features can be applied, like Latent Dirich...

متن کامل

Feature extraction of hyperspectral images using boundary semi-labeled samples and hybrid criterion

Feature extraction is a very important preprocessing step for classification of hyperspectral images. The linear discriminant analysis (LDA) method fails to work in small sample size situations. Moreover, LDA has poor efficiency for non-Gaussian data. LDA is optimized by a global criterion. Thus, it is not sufficiently flexible to cope with the multi-modal distributed data. We propose a new fea...

متن کامل

Multi-label Linear Discriminant Analysis

Multi-label problems arise frequently in image and video annotations, and many other related applications such as multi-topic text categorization, music classification, etc. Like other computer vision tasks, multi-label image and video annotations also suffer from the difficulty of high dimensionality because images often have a large number of features. Linear discriminant analysis (LDA) is a ...

متن کامل

Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vi...

متن کامل

Capturing correlations of multiple labels: A generative probabilistic model for multi-label learning

Recent years have witnessed a considerable surge of interest in the multi-label learning problem. It has been shown that a key factor for a successful multi-label learning algorithm is to effectively exploit relations between labels. However, most of the previous work exploiting label relations focuses on pairwise relations. To handle the situations where there are intrinsic correlations among ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1709.05480  شماره 

صفحات  -

تاریخ انتشار 2017